Integrated Sequence-Structure Motifs Suffice to Identify microRNA Precursors
Abstract
BACKGROUND: More than 1200 miRNA loci have been annotated in the human genome to date. The specific features that define a miRNA precursor and determine its recognition and subsequent processing have not yet been exhaustively described, so miRNA loci cannot be computationally identified with sufficient confidence.
RESULTS: We rendered pre-miRNA and non-pre-miRNA hairpins as strings of integrated sequence-structure information and used the software Teiresias to identify sequence-structure motifs (ss-motifs) of variable length in these data sets. A Support Vector Machine (SVM) classifier using only ss-motifs as features achieved 99.2% specificity and 97.6% sensitivity for pre-miRNA identification on a human test data set, which is comparable to previously published algorithms that combine sequence-structure features with additional features. Further analysis showed that the information contents of the ss-motifs deviate highly significantly from those of the respective training sets, providing important potential clues as to how the sequence and structural information of RNA hairpins is used by the miRNA processing apparatus.
CONCLUSION: Integrated sequence-structure motifs of variable length apparently capture nearly all the information required to distinguish miRNA precursors from other stem-loop structures.
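The feature construction outlined in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and does not run Teiresias. The encoding (upper case for paired bases, lower case for unpaired bases), the treatment of motifs as plain substrings without wildcards, and the function names `encode_hairpin`, `motif_features`, and `sensitivity_specificity` are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def encode_hairpin(sequence: str, structure: str) -> str:
    """Merge a nucleotide sequence and its dot-bracket structure into one string.

    Assumed encoding (not taken from the paper): paired bases are written in
    upper case, unpaired bases in lower case.
    """
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure must have equal length")
    symbols = []
    for base, state in zip(sequence.upper(), structure):
        if state in "()":
            symbols.append(base)          # paired position
        elif state == ".":
            symbols.append(base.lower())  # unpaired position
        else:
            raise ValueError(f"unexpected structure symbol: {state!r}")
    return "".join(symbols)


def motif_features(encoded: str, motifs: Sequence[str]) -> List[int]:
    """Binary presence/absence vector, one entry per motif.

    Motifs are treated as plain substrings here; real motif-discovery output
    may contain wildcards, which this sketch does not handle.
    """
    return [1 if motif in encoded else 0 for motif in motifs]


def sensitivity_specificity(y_true: Sequence[int],
                            y_pred: Sequence[int]) -> Tuple[float, float]:
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


# Toy hairpin (made-up data, not from the paper's data sets).
encoded = encode_hairpin("GGAUACUUCGGUAUCC", "((((((....))))))")
print(encoded)                                            # GGAUACuucgGUAUCC
print(motif_features(encoded, ["ACuu", "cgGU", "aaaa"]))  # [1, 1, 0]
```

Vectors like the one returned by `motif_features` are the kind of input an SVM classifier would be trained on, and `sensitivity_specificity` shows how the two reported metrics are defined for binary labels (1 = pre-miRNA, 0 = other hairpin).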
Similar resources
Structure analysis of microRNA precursors.
MicroRNA biogenesis occurs in several steps from their precursors having irregular hairpin structures. The highly variable architecture of these stem-and-loop structures, which have terminal loops of various sizes and diverse structure destabilizing motifs present in their stem sections, may strongly influence the process of microRNA liberation. In order to better understand this process, more ...
Differential Repression of Alternative Transcripts: A Screen for miRNA Targets
Alternative polyadenylation sites produce transcript isoforms with 3' untranslated regions (UTRs) of different lengths. If a microRNA (miRNA) target is present in the UTR, then only those target-containing isoforms should be sensitive to control by a cognate miRNA. We carried out a systematic examination of 3' UTRs containing multiple poly(A) sites and putative miRNA targets. Based on expressed...
Finding local RNA motifs using covariance models
We present DISCO, an algorithm to detect conserved motifs in sets of unaligned RNA sequences. Our algorithm uses covariance models (CM) to represent motifs. We introduce a novel approach to initialise a CM using pairwise and multiple sequence alignment. The CM is then iteratively refined. We tested our algorithm on 26 data sets derived from Rfam seed alignments of microRNA (miRNA) precursors an...
Regmex, Motif analysis in ranked lists of sequences
Motif analysis has long been an important method to characterize biological functionality and the current growth of sequencing-based genomics experiments further extends its potential. These diverse experiments often generate sequence lists ranked by some functional property. There is therefore a growing need for motif analysis methods that can exploit this coupled data structure and be tailore...
The roles of EPIYA sequence to perturb the cellular signaling pathways and cancer risk
It was shown that several pathogenic bacterial effector proteins contain the Glu-Pro-Ile-Tyr-Ala (EPIYA) or a similar sequence. These bacterial EPIYA effectors are delivered into host cell via type III or IV secretion system, where they undergo tyrosine phosphorylation at the EPIYA sequences, which triggers interaction with multiple host cell SH2 domain-containing proteins and thereby...